CRM analytics are data that demonstrate your company’s sales and customer service performance. CRM analytics also presents customer data that you can use to inform smarter business decisions. Typically, you’ll use CRM software to obtain CRM analytics and automate all your data collection and report generation.
The primary benefit of CRM analysis is that you can use it to inform your sales, customer service and marketing processes. You can use your CRM analytics to improve your methods via:
Customer service evaluations. CRM analytics fill you in on your customer service team’s performance. If you see figures that your team could improve, implement practices that push your team toward these goals.
Accurate customer data. Whether you’re using your customer data for demographic marketing or email marketing, you need to know whether you’re reaching the right person. CRM analysis ensures you’re doing just that.
Thorough customer analytics. How much does your customer usually spend per quarter with you? Are they buying the same products time and time again, or does it vary? With CRM analytics, you’ll get firm answers to these questions, and you can use what you learn to refine your marketing strategies.
Efficient lead generation. Your CRM analysis can tell you which of your marketing efforts most strongly correlate to purchases. If you see one approach correlating strongly to purchases but have only targeted a sliver of your customers with that approach, try that method more – your sales might increase.
pip install black
Requirement already satisfied: black in c:\users\jki\anaconda3\lib\site-packages (0.0) Requirement already satisfied: click>=8.0.0 in c:\users\jki\anaconda3\lib\site-packages (from black) (8.0.4) Requirement already satisfied: mypy-extensions>=0.4.3 in c:\users\jki\anaconda3\lib\site-packages (from black) (0.4.3) Requirement already satisfied: packaging>=22.0 in c:\users\jki\anaconda3\lib\site-packages (from black) (23.0) Requirement already satisfied: pathspec>=0.9.0 in c:\users\jki\anaconda3\lib\site-packages (from black) (0.10.3) Requirement already satisfied: platformdirs>=2 in c:\users\jki\anaconda3\lib\site-packages (from black) (2.5.2) Requirement already satisfied: colorama in c:\users\jki\anaconda3\lib\site-packages (from click>=8.0.0->black) (0.4.6) Note: you may need to restart the kernel to use updated packages.
import os
import datetime
import squarify
import warnings
import pandas as pd
import numpy as np
import datetime as dt
from operator import attrgetter
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import plotly.graph_objs as go
from plotly.offline import iplot
from sklearn.metrics import (silhouette_score,
calinski_harabasz_score,
davies_bouldin_score)
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.plotting import plot_period_transactions
%matplotlib inline
%load_ext nb_black
warnings.filterwarnings('ignore')
sns.set_style('whitegrid')
palette = 'Set2'
--------------------------------------------------------------------------- ModuleNotFoundError Traceback (most recent call last) Cell In[12], line 20 18 from lifetimes.plotting import plot_period_transactions 19 get_ipython().run_line_magic('matplotlib', 'inline') ---> 20 get_ipython().run_line_magic('load_ext', 'nb_black') 21 warnings.filterwarnings('ignore') 22 sns.set_style('whitegrid') File ~\anaconda3\Lib\site-packages\IPython\core\interactiveshell.py:2414, in InteractiveShell.run_line_magic(self, magic_name, line, _stack_depth) 2412 kwargs['local_ns'] = self.get_local_scope(stack_depth) 2413 with self.builtin_trap: -> 2414 result = fn(*args, **kwargs) 2416 # The code below prevents the output from being displayed 2417 # when using magics with decodator @output_can_be_silenced 2418 # when the last Python token in the expression is a ';'. 2419 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False): File ~\anaconda3\Lib\site-packages\IPython\core\magics\extension.py:33, in ExtensionMagics.load_ext(self, module_str) 31 if not module_str: 32 raise UsageError('Missing module name.') ---> 33 res = self.shell.extension_manager.load_extension(module_str) 35 if res == 'already loaded': 36 print("The %s extension is already loaded. To reload it, use:" % module_str) File ~\anaconda3\Lib\site-packages\IPython\core\extensions.py:76, in ExtensionManager.load_extension(self, module_str) 69 """Load an IPython extension by its module name. 70 71 Returns the string "already loaded" if the extension is already loaded, 72 "no load function" if the module doesn't have a load_ipython_extension 73 function, or None if it succeeded. 74 """ 75 try: ---> 76 return self._load_extension(module_str) 77 except ModuleNotFoundError: 78 if module_str in BUILTINS_EXTS: File ~\anaconda3\Lib\site-packages\IPython\core\extensions.py:91, in ExtensionManager._load_extension(self, module_str) 89 with self.shell.builtin_trap: 90 if module_str not in sys.modules: ---> 91 mod = import_module(module_str) 92 mod = sys.modules[module_str] 93 if self._call_load_ipython_extension(mod): File ~\anaconda3\Lib\importlib\__init__.py:126, in import_module(name, package) 124 break 125 level += 1 --> 126 return _bootstrap._gcd_import(name[level:], package, level) File <frozen importlib._bootstrap>:1204, in _gcd_import(name, package, level) File <frozen importlib._bootstrap>:1176, in _find_and_load(name, import_) File <frozen importlib._bootstrap>:1140, in _find_and_load_unlocked(name, import_) ModuleNotFoundError: No module named 'nb_black'
import pandas as pd
# Try reading the CSV file with different encodings
try:
crm_data_df = pd.read_csv(r"C:\Users\jki\Downloads\crm data.csv", encoding='utf-8')
except UnicodeDecodeError:
crm_data_df = pd.read_csv(r"C:\Users\jki\Downloads\crm data.csv", encoding='latin1')
crm_data_df.head(5)
| InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 8:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 8:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
Typically e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, The UCI Machine Learning Repository has made this dataset containing actual transactions from 2010 and 2011. The dataset is maintained on their site, where it can be found by the title "Online Retail".
InvoiceNo: Invoice number that consists 6 digits. If this code starts with letter 'c', it indicates a cancellation.
StockCode: Product code that consists 5 digits.
Description: Product name.
Quantity: The quantities of each product per transaction.
InvoiceDate: Represents the day and time when each transaction was generated.
UnitPrice: Product price per unit.
CustomerID: Customer number that consists 5 digits. Each customer has a unique customer ID.
Country: Name of the country where each customer resides.
world_map = crm_data_df[['CustomerID', 'InvoiceNo', 'Country']
].groupby(['CustomerID', 'InvoiceNo', 'Country']
).count().reset_index(drop = False)
countries = world_map['Country'].value_counts()
data = dict(type='choropleth',
locations = countries.index,
locationmode = 'country names',
z = countries,
text = countries.index,
colorbar = {'title':'Orders'},
colorscale='Viridis',
reversescale = False)
layout = dict(title={'text': "Number of Orders by Countries",
'y':0.9,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'},
geo = dict(resolution = 50,
showocean = True,
oceancolor = "LightBlue",
showland = True,
landcolor = "whitesmoke",
showframe = True),
template = 'plotly_white',
height = 600,
width = 1000)
choromap = go.Figure(data = [data], layout = layout)
iplot(choromap, validate = False)
def desc_stats(dataframe):
desc_df = pd.DataFrame(index= dataframe.columns,
columns= dataframe.describe().T.columns,
data= dataframe.describe().T)
f,ax = plt.subplots(figsize=(10,
desc_df.shape[0] * 0.81))
sns.heatmap(desc_df,
annot = True,
cmap = "Greens",
fmt = '.2f',
ax = ax,
linecolor = 'white',
linewidths = 1.1,
cbar = False,
annot_kws = {"size": 12})
plt.xticks(size = 18)
plt.yticks(size = 14,
rotation = 0)
plt.title("Descriptive Statistics", size = 14)
plt.show()
desc_stats(crm_data_df.select_dtypes(include = [float, int]))
📌 It is clearly seems that there are ouliters in Quantity and UnitPrice that have to be handled
📌 There are negative values in UnitPrice and Quantity because of cancelled orders.
📌 Missing values in Customer ID and Description.
📌 Quantity and Unit Price should be multiplied in order to create Total Price.
def replace_with_thresholds(dataframe, variable, q1 = 0.25, q3 = 0.75):
'''
Detects outliers with IQR method and replaces with thresholds
'''
df_ = dataframe.copy()
quartile1 = df_[variable].quantile(q1)
quartile3 = df_[variable].quantile(q3)
iqr = quartile3 - quartile1
up_limit = quartile3 + 1.5 * iqr
low_limit = quartile1 - 1.5 * iqr
df_.loc[(df_[variable] < low_limit), variable] = low_limit
df_.loc[(df_[variable] > up_limit), variable] = up_limit
return df_
def ecommerce_preprocess(dataframe):
df_ = dataframe.copy()
#Missing Values
df_ = df_.dropna()
#Cancelled Orders & Quantity
df_ = df_[~df_['InvoiceNo'].str.contains('C', na = False)]
df_ = df_[df_['Quantity'] > 0]
#Replacing Outliers
df_ = replace_with_thresholds(df_, "Quantity", q1 = 0.01, q3 = 0.99)
df_ = replace_with_thresholds(df_, "UnitPrice", q1 = 0.01, q3 = 0.99)
#Total Price
df_["TotalPrice"] = df_["Quantity"] * df_["UnitPrice"]
return df_
crm_data_df= ecommerce_preprocess(crm_data_df)
desc_stats(crm_data_df.select_dtypes(include = [float, int]))
RFM stands for Recency, Frequency, and Monetary value, each corresponding to some key customer trait. These RFM metrics are important indicators of a customer’s behavior because frequency and monetary value affects a customer’s lifetime value, and recency affects retention, a measure of engagement.
RFM factors illustrate these facts:
The more recent the purchase, the more responsive the customer is to promotions The more frequently the customer buys, the more engaged and satisfied they are Monetary value differentiates heavy spenders from low-value purchasers
print(crm_data_df['InvoiceDate'].max())
9/9/2011 9:52
import pandas as pd
import datetime as dt
# Assuming 'InvoiceDate' is a string column representing dates
crm_data_df['InvoiceDate'] = pd.to_datetime(crm_data_df['InvoiceDate'])
today_date = dt.datetime(2011, 9, 9)
# Drop rows with NaN values
crm_data_df.dropna(subset=['InvoiceDate', 'InvoiceNo', 'TotalPrice'], inplace=True)
rfm = crm_data_df.groupby('CustomerID').agg({'InvoiceDate': lambda x: (today_date - x.max()).days,
'InvoiceNo': lambda x: x.nunique(),
'TotalPrice': lambda x: x.sum()})
rfm.columns = ['recency', 'frequency', 'monetary']
# Filter 'monetary' column based on the condition
rfm['monetary'] = rfm['monetary'][rfm['monetary'] > 0]
rfm = rfm.reset_index()
rfm.head()
| CustomerID | recency | frequency | monetary | |
|---|---|---|---|---|
| 0 | 12346.0 | 233 | 1 | 310.44 |
| 1 | 12347.0 | -90 | 7 | 4310.00 |
| 2 | 12348.0 | -17 | 4 | 1770.78 |
| 3 | 12349.0 | -74 | 1 | 1491.72 |
| 4 | 12350.0 | 218 | 1 | 331.46 |
def get_rfm_scores(dataframe) -> pd.core.frame.DataFrame:
df_ = dataframe.copy()
df_["recency_score"] = pd.qcut(df_["recency"], 5, labels=[5, 4, 3, 2, 1])
df_["frequency_score"] = pd.qcut(
df_["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]
)
df_["monetary_score"] = pd.qcut(df_["monetary"], 5, labels=[1, 2, 3, 4, 5])
df_["RFM_SCORE"] = df_["recency_score"].astype(str) + df_["frequency_score"].astype(
str
)
return df_
rfm = get_rfm_scores(rfm)
seg_map = {r'[1-2][1-2]': 'hibernating',
r'[1-2][3-4]': 'at_Risk',
r'[1-2]5': 'cant_loose',
r'3[1-2]': 'about_to_sleep',
r'33': 'need_attention',
r'[3-4][4-5]': 'loyal_customers',
r'41': 'promising',
r'51': 'new_customers',
r'[4-5][2-3]': 'potential_loyalists',
r'5[4-5]': 'champions'}
rfm['segment'] = rfm['RFM_SCORE'].replace(seg_map, regex = True)
rfm.head()
| CustomerID | recency | frequency | monetary | recency_score | frequency_score | monetary_score | RFM_SCORE | segment | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 12346.0 | 233 | 1 | 310.44 | 1 | 1 | 2 | 11 | hibernating |
| 1 | 12347.0 | -90 | 7 | 4310.00 | 5 | 5 | 5 | 55 | champions |
| 2 | 12348.0 | -17 | 4 | 1770.78 | 2 | 4 | 4 | 24 | at_Risk |
| 3 | 12349.0 | -74 | 1 | 1491.72 | 4 | 1 | 4 | 41 | promising |
| 4 | 12350.0 | 218 | 1 | 331.46 | 1 | 1 | 2 | 11 | hibernating |
segments = rfm["segment"].value_counts().sort_values(ascending=False)
fig = plt.gcf()
ax = fig.add_subplot()
fig.set_size_inches(16, 10)
squarify.plot(
sizes=segments,
label=[label for label in seg_map.values()],
color=[
"#AFB6B5",
"#F0819A",
"#926717",
"#F0F081",
"#81D5F0",
"#C78BE5",
"#748E80",
"#FAAF3A",
"#7B8FE4",
"#86E8C0",
],
pad=False,
bar_kwargs={"alpha": 1},
text_kwargs={"fontsize": 15},
)
plt.title("Customer Segmentation Map", fontsize=20)
plt.xlabel("Frequency", fontsize=18)
plt.ylabel("Recency", fontsize=18)
plt.show()
print(' RFM Model Evaluation '.center(70, '='))
X = rfm[['recency_score', 'frequency_score']]
labels = rfm['segment']
print(f'Number of Observations: {X.shape[0]}')
print(f'Number of Segments: {labels.nunique()}')
print(f'Silhouette Score: {round(silhouette_score(X, labels), 3)}')
print(f'Calinski Harabasz Score: {round(calinski_harabasz_score(X, labels), 3)}')
print(f'Davies Bouldin Score: {round(davies_bouldin_score(X, labels), 3)} \n{70*"="}')
======================== RFM Model Evaluation ======================== Number of Observations: 4339 Number of Segments: 10 Silhouette Score: 0.471 Calinski Harabasz Score: 4591.503 Davies Bouldin Score: 0.633 ======================================================================
rfm[['recency','monetary','frequency','segment']]\
.groupby('segment')\
.agg({'mean','std','max','min'})
| recency | monetary | frequency | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| min | std | max | mean | min | std | max | mean | min | std | max | mean | |
| segment | ||||||||||||
| about_to_sleep | -59 | 10.943474 | -21 | -39.687500 | 6.20 | 533.161343 | 6207.670 | 469.058097 | 1 | 0.368913 | 2 | 1.161932 |
| at_Risk | -20 | 68.618828 | 281 | 60.785835 | 52.00 | 943.170824 | 11072.670 | 938.458341 | 2 | 0.954415 | 6 | 2.878583 |
| cant_loose | -20 | 65.250378 | 280 | 39.968254 | 70.02 | 2008.352354 | 10254.180 | 2646.822540 | 6 | 4.289630 | 34 | 8.380952 |
| champions | -92 | 3.683300 | -80 | -86.638231 | 198.23 | 18366.105822 | 266163.525 | 6498.612978 | 3 | 16.476484 | 210 | 12.417062 |
| hibernating | -20 | 92.013560 | 281 | 124.605042 | 3.75 | 553.714975 | 9182.345 | 398.573036 | 1 | 0.302492 | 2 | 1.101774 |
| loyal_customers | -78 | 15.577050 | -21 | -59.391941 | 36.56 | 5405.787839 | 120210.355 | 2752.519574 | 3 | 4.545669 | 63 | 6.479853 |
| need_attention | -59 | 11.552074 | -21 | -40.572193 | 6.90 | 732.298552 | 4353.830 | 847.657086 | 2 | 0.470081 | 3 | 2.326203 |
| new_customers | -92 | 3.902112 | -80 | -85.571429 | 89.94 | 174.620952 | 848.550 | 314.883690 | 1 | 0.000000 | 1 | 1.000000 |
| potential_loyalists | -92 | 9.338808 | -60 | -75.601240 | 20.80 | 601.746293 | 4628.750 | 674.628357 | 1 | 0.651401 | 3 | 2.010331 |
| promising | -78 | 5.238115 | -60 | -69.578947 | 30.00 | 223.634736 | 1491.720 | 285.623723 | 1 | 0.000000 | 1 | 1.000000 |
import seaborn as sns
import matplotlib.pyplot as plt
# Define a palette
palette = sns.color_palette("pastel")
# Your existing code
plt.figure(figsize=(18, 8))
ax = sns.countplot(data=rfm,
x='segment',
palette=palette)
total = len(rfm.segment)
for patch in ax.patches:
percentage = '{:.1f}%'.format(100 * patch.get_height()/total)
x = patch.get_x() + patch.get_width() / 2 - 0.17
y = patch.get_y() + patch.get_height() * 1.005
ax.annotate(percentage, (x, y), size=14)
plt.title('Number of Customers by Segments', size=16)
plt.xlabel('Segment', size=14)
plt.ylabel('Count', size=14)
plt.xticks(size=10)
plt.yticks(size=10)
plt.show()
plt.figure(figsize=(18, 8))
sns.scatterplot(
data=rfm, x="recency", y="frequency", hue="segment", palette=palette, s=60
)
plt.title("Recency & Frequency by Segments", size=16)
plt.xlabel("Recency", size=12)
plt.ylabel("Frequency", size=12)
plt.xticks(size=10)
plt.yticks(size=10)
plt.legend(loc="best", fontsize=12, title="Segments", title_fontsize=14)
plt.show()
fig, axes = plt.subplots(1, 3, figsize=(18, 8))
fig.suptitle("RFM Segment Analysis", size=14)
feature_list = ["recency", "monetary", "frequency"]
for idx, col in enumerate(feature_list):
sns.boxplot(
ax=axes[idx], data=rfm, x="segment", y=feature_list[idx], palette=palette
)
axes[idx].set_xticklabels(axes[idx].get_xticklabels(), rotation=60)
if idx == 1:
axes[idx].set_ylim([0, 400])
if idx == 2:
axes[idx].set_ylim([0, 30])
plt.tight_layout()
plt.show()
fig, axes = plt.subplots(3, 1, figsize=(16, 12))
fig.suptitle('RFM Segment Analysis', size = 14)
feature_list = ['recency', 'monetary', 'frequency']
for idx, col in enumerate(feature_list):
sns.histplot(ax = axes[idx], data = rfm,
hue = 'segment', x = feature_list[idx],
palette= palette)
if idx == 1:
axes[idx].set_xlim([0, 400])
if idx == 2:
axes[idx].set_xlim([0, 30])
plt.tight_layout()
plt.show()
C:\Users\jki\AppData\Local\Temp\ipykernel_16348\1358203543.py:12: UserWarning: Creating legend with loc="best" can be slow with large amounts of data. C:\Users\jki\anaconda3\Lib\site-packages\IPython\core\pylabtools.py:152: UserWarning: Creating legend with loc="best" can be slow with large amounts of data.
A cohort is a group of people sharing something in common, such as the sign-up date to an app, the month of the first purchase, geographical location, acquisition channel (organic users, coming from performance marketing, etc.) and so on. In Cohort Analysis, we track these groups of users over time, to identify some common patterns or behaviors
def CohortAnalysis(dataframe):
data = dataframe.copy()
data = data[["CustomerID", "InvoiceNo", "InvoiceDate"]].drop_duplicates()
data["order_month"] = data["InvoiceDate"].dt.to_period("M")
data["cohort"] = (
data.groupby("CustomerID")["InvoiceDate"].transform("min").dt.to_period("M")
)
cohort_data = (
data.groupby(["cohort", "order_month"])
.agg(n_customers=("CustomerID", "nunique"))
.reset_index(drop=False)
)
cohort_data["period_number"] = (cohort_data.order_month - cohort_data.cohort).apply(
attrgetter("n")
)
cohort_pivot = cohort_data.pivot_table(
index="cohort", columns="period_number", values="n_customers"
)
cohort_size = cohort_pivot.iloc[:, 0]
retention_matrix = cohort_pivot.divide(cohort_size, axis=0)
with sns.axes_style("white"):
fig, ax = plt.subplots(
1, 2, figsize=(12, 8), sharey=True, gridspec_kw={"width_ratios": [1, 11]}
)
sns.heatmap(
retention_matrix,
mask=retention_matrix.isnull(),
annot=True,
cbar=False,
fmt=".0%",
cmap="coolwarm",
ax=ax[1],
)
ax[1].set_title("Monthly Cohorts: User Retention", fontsize=14)
ax[1].set(xlabel="# of periods", ylabel="")
white_cmap = mcolors.ListedColormap(["white"])
sns.heatmap(
pd.DataFrame(cohort_size).rename(columns={0: "cohort_size"}),
annot=True,
cbar=False,
fmt="g",
cmap=white_cmap,
ax=ax[0],
)
fig.tight_layout()
CohortAnalysis(crm_data_df)
Customer lifetime value is how much money a customer will bring your brand throughout their entire time as a paying customer.
cltv_df = crm_data_df.groupby("CustomerID").agg(
{
"InvoiceDate": [
lambda x: (x.max() - x.min()).days,
lambda x: (today_date - x.min()).days,
],
"InvoiceNo": "nunique",
"TotalPrice": "sum",
}
)
# Droplevel and rename columns
cltv_df.columns = cltv_df.columns.droplevel(0)
cltv_df.columns = ["recency", "T", "frequency", "monetary"]
cltv_df.head()
| recency | T | frequency | monetary | |
|---|---|---|---|---|
| CustomerID | ||||
| 12346.0 | 0 | 233 | 1 | 310.44 |
| 12347.0 | 365 | 275 | 7 | 4310.00 |
| 12348.0 | 282 | 266 | 4 | 1770.78 |
| 12349.0 | 0 | -74 | 1 | 1491.72 |
| 12350.0 | 0 | 218 | 1 | 331.46 |
#Average Order Value
cltv_df["monetary"] = cltv_df["monetary"] / cltv_df["frequency"]
#Recency & Tenure
cltv_df["recency"] = cltv_df["recency"] / 7
cltv_df["T"] = cltv_df["T"] / 7
#Frequency
cltv_df = cltv_df[(cltv_df['frequency'] > 1)]